
[ROCm] Enable fused_silu_mul_block_quant on ROCm #38817

Open
gshtras wants to merge 4 commits into vllm-project:main from ROCm:fused_silu_mul_block_quant_rocm

gshtras (Collaborator) commented Apr 2, 2026

Another follow-up to #32996. This time the new kernel is properly enabled on ROCm instead of being guarded out.

The include path changes are needed because the hipify script ignores absolute include paths. As a result, multiple slightly different versions of the same header end up being included, causing symbol redefinition errors.
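To make the include-path problem concrete, here is a toy model in Python. It is not the real hipify script: it assumes a simplified rule where a quoted include is rewritten to its hipified name only when it resolves relative to the including file's directory, and it assumes "absolute" means a path rooted at a project include directory rather than at the including file. The file names and the `_hip` suffix are hypothetical. Under those assumptions, one translation unit can end up pulling in both the original header and its hipified copy.

```python
import posixpath
import re

# Matches quoted includes such as: #include "utils.cuh"
INCLUDE_RE = re.compile(r'#include\s+"([^"]+)"')


def hip_name(path: str) -> str:
    """Hypothetical naming rule for hipified headers: utils.cuh -> utils_hip.cuh."""
    root, ext = posixpath.splitext(path)
    return f"{root}_hip{ext}"


def toy_hipify(source_path: str, text: str, hipified: set[str]) -> str:
    """Rewrite only the includes that resolve relative to the including file.

    Includes that cannot be resolved that way are left untouched, so they
    keep pointing at the original (non-hipified) header.
    """
    base = posixpath.dirname(source_path)

    def repl(m: re.Match) -> str:
        inc = m.group(1)
        resolved = posixpath.normpath(posixpath.join(base, inc))
        if resolved in hipified:
            return f'#include "{hip_name(inc)}"'
        return m.group(0)  # unresolved: left as the original header

    return INCLUDE_RE.sub(repl, text)


# Hypothetical layout: one header, included two different ways in one file.
hipified = {"csrc/quantization/utils.cuh"}
src = "csrc/quantization/kernel.cu"
text = '#include "utils.cuh"\n#include "quantization/utils.cuh"\n'
print(toy_hipify(src, text, hipified))
```

In this model the first include becomes `utils_hip.cuh` while the project-rooted one still names `utils.cuh`, so both files land in the same translation unit. Writing the second include relative to the including file (for example `../quantization/utils.cuh`) lets it resolve, and both includes then point at the same hipified header.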

Setting the device index globally in the test fixes the illegal memory access (IMA) error raised by torch on ROCm.

gshtras added 3 commits April 1, 2026 22:34
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request enables the silu_and_mul_per_block_quant kernel for ROCm by removing conditional compilation guards in csrc/ops.h and csrc/torch_bindings.cpp, and adjusting CMakeLists.txt. It also updates include paths to use relative addressing and refactors kernel tests and fusion passes to be more platform-agnostic by using current_platform helpers. One issue was identified in csrc/torch_bindings.cpp where a comment describing DeepSeek V3 GEMM was accidentally moved and is now incorrectly associated with the SiLU quantization operator.
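For readers unfamiliar with the operator, the following pure-Python reference sketch shows what a fused SiLU-and-mul plus per-block FP8 quantization is assumed to compute: silu(gate) * up over the two halves of the last dimension, then one scale per group of `group_size` values, with scale = amax / fp8_max. The group size, the `eps` clamp, the helper names and passing the FP8 dtype as a string are all assumptions for illustration, and rounding onto the real FP8 grid is skipped. The dtype is a parameter because the two E4M3 variants have different ranges (448 for float8_e4m3fn, 240 for float8_e4m3fnuz), which is the kind of platform difference a current_platform helper lets a test avoid hardcoding.

```python
import math

# Largest finite magnitudes of the two FP8 E4M3 variants.
FP8_MAX = {"float8_e4m3fn": 448.0, "float8_e4m3fnuz": 240.0}


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def silu_and_mul(row: list[float]) -> list[float]:
    """Apply SiLU to the first half of `row` and multiply by the second half."""
    d = len(row) // 2
    return [silu(a) * b for a, b in zip(row[:d], row[d:])]


def per_block_quant(values: list[float], group_size: int, fp8_dtype: str,
                    eps: float = 1e-10) -> tuple[list[float], list[float]]:
    """Scale each group of `group_size` values into the FP8 range.

    Returns (scaled values clamped to +/-fp8_max, one scale per group).
    Rounding to the actual FP8 grid is intentionally omitted.
    """
    fp8_max = FP8_MAX[fp8_dtype]
    scaled: list[float] = []
    scales: list[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        scale = max(max(abs(v) for v in group), eps) / fp8_max
        scales.append(scale)
        scaled.extend(max(-fp8_max, min(fp8_max, v / scale)) for v in group)
    return scaled, scales


def silu_mul_block_quant_ref(row: list[float], group_size: int, fp8_dtype: str):
    """Hypothetical unfused reference: silu_and_mul, then per-block quant."""
    return per_block_quant(silu_and_mul(row), group_size, fp8_dtype)


out, scales = silu_mul_block_quant_ref([0.0, 1.0, 2.0, 3.0], 2, "float8_e4m3fnuz")
print(out, scales)
```

A kernel test built on a reference like this would compare the fused op's output and scales against it, taking the FP8 dtype from the platform instead of assuming the CUDA variant.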

gshtras added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Apr 2, 2026

Labels

ci/build, ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)

Projects

Status: Todo


2 participants